Fast Cross-Validation via Sequential Testing
With the increasing size of today's data sets, finding the right parameter
configuration in model selection via cross-validation can be an extremely
time-consuming task. In this paper we propose an improved cross-validation
procedure which uses nonparametric testing coupled with sequential analysis to
determine the best parameter set on linearly increasing subsets of the data. By
eliminating underperforming candidates quickly and keeping promising candidates
as long as possible, the method speeds up the computation while preserving the
capability of the full cross-validation. Theoretical considerations underline
the statistical power of our procedure. The experimental evaluation shows that
our method reduces the computation time by a factor of up to 120 compared to a
full cross-validation, with a negligible impact on accuracy.
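The elimination idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: a crude relative-error threshold (the hypothetical `tolerance` parameter) stands in for the nonparametric sequential test the paper proposes, the model is a toy 1-D k-nearest-neighbour regressor, and all function names are invented for the example.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, test, k):
    """Mean squared error of the k-NN regressor on the held-out points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)

def fast_cv_sketch(data, candidates, steps=5, tolerance=0.5):
    """Evaluate candidates on linearly growing subsets, dropping weak ones early."""
    active = list(candidates)
    n = len(data)
    for step in range(1, steps + 1):
        size = n * step // (steps + 1)      # subset size grows linearly with the step
        train, test = data[:size], data[size:]
        errors = {k: mse(train, test, k) for k in active}
        best = min(errors.values())
        # Assumed rule (not the paper's test): keep candidates within
        # a factor (1 + tolerance) of the current best error.
        active = [k for k in active if errors[k] <= best * (1 + tolerance)]
        if len(active) == 1:
            break
    return active

# Toy data: a noisy sine curve sampled at random inputs.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1)))

candidates = [1, 3, 5, 15, 50, 120]
survivors = fast_cv_sketch(data, candidates)
print(survivors)
```

Because the best candidate at each step always satisfies the threshold, at least one configuration survives. The saving comes from evaluating the eliminated candidates only on the small early subsets.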
Estimating Local Function Complexity via Mixture of Gaussian Processes
Real world data often exhibit inhomogeneity, e.g., the noise level, the
sampling distribution or the complexity of the target function may change over
the input space. In this paper, we try to isolate local function complexity in
a practical, robust way. This is achieved by first estimating the locally
optimal kernel bandwidth as a functional relationship. Specifically, we propose
Spatially Adaptive Bandwidth Estimation in Regression (SABER), which employs
a mixture of experts with multinomial kernel logistic regression as
a gate and Gaussian process regression models as experts. Using the locally
optimal kernel bandwidths, we deduce an estimate of the local function
complexity by drawing parallels to the theory of locally linear smoothing. We
demonstrate the usefulness of local function complexity for model
interpretation and active learning in quantum chemistry experiments and fluid
dynamics simulations.
Comment: 19 pages, 16 figures
Higher order stationary subspace analysis
Non-stationarity in data is a ubiquitous problem in signal processing. The recent stationary subspace analysis (SSA) procedure makes it possible to decompose such data into a stationary subspace and a non-stationary part. Algorithmically, SSA can tackle only weak non-stationarities. The present paper takes the conceptual step of generalizing from the first and second moments used in SSA to higher order moments, thus defining the proposed higher order stationary subspace analysis (HOSSA) procedure. The paper derives the novel procedure and presents simulations. A clear trade-off is observed between the need to estimate higher moments and the accuracy and robustness with which they can be estimated. In an ideal setting with plenty of data, where higher moment information dominates, the novel approach can outperform standard SSA. With limited data, however, SSA may still perform on par, even when higher moments actually dominate the underlying data.
BMBF, 01IB15001B, Verbundprojekt: ALICE II - Autonomes Lernen in komplexen Umgebungen 2 (Autonomous Learning in Complex Environments 2)
BMBF, 01GQ1115, D-JPN Verbund: Adaptive Gehirn-Computer-Schnittstellen (BCI) in nichtstationären Umgebungen
DFG, 200318152, Theoretische Konzepte für co-adaptive Mensch-Maschine-Interaktion mit Anwendungen auf BC
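To see why moments beyond the second can matter, consider the toy sketch below. It is not the HOSSA algorithm and finds no subspace. It only compares per-epoch moments of a one-dimensional signal, and the helper names `central_moment` and `moment_spread` are invented for the illustration. The two epochs are constructed to share mean and variance while differing in their fourth moment.

```python
import math

def central_moment(values, order):
    """Return the mean for order 1, otherwise the central moment of that order."""
    mean = sum(values) / len(values)
    if order == 1:
        return mean
    return sum((v - mean) ** order for v in values) / len(values)

def moment_spread(epochs, order):
    """Range of one moment across epochs; near 0 means it looks stationary."""
    moments = [central_moment(epoch, order) for epoch in epochs]
    return max(moments) - min(moments)

r = math.sqrt(2.0)
epochs = [
    [1.0, -1.0, 1.0, -1.0],  # mean 0, variance 1, fourth moment 1
    [0.0, 0.0, r, -r],       # mean 0, variance 1, fourth moment 2
]

spreads = {order: moment_spread(epochs, order) for order in (1, 2, 3, 4)}
for order, spread in spreads.items():
    print(order, round(spread, 6))
```

A criterion that compares only means and variances across epochs would treat this signal as stationary, whereas the fourth moment reveals the change. With so few samples per epoch, higher moment estimates would be unreliable on real data, which mirrors the trade-off the abstract describes.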
Sharing hash codes for multiple purposes
Locality sensitive hashing (LSH) is a powerful tool in data science that enables sublinear-time approximate nearest neighbor search. A variety of hashing schemes have been proposed for different dissimilarity measures. However, hash codes depend strongly on the dissimilarity, which prevents users from adjusting the dissimilarity at query time. In this paper, we propose multiple purpose LSH (mp-LSH), which shares the hash codes across different dissimilarities. mp-LSH supports L2, cosine, and inner product dissimilarities, and their corresponding weighted sums, where the weights can be adjusted at query time. It also allows us to modify the importance of pre-defined groups of features. Thus, mp-LSH enables us, for example, to retrieve items similar to a query with the user preference taken into account, to find a material similar to a query with some properties (stability, utility, etc.) optimized, and to turn on or off a part of the multi-modal information (brightness, color, audio, text, etc.) in image/video retrieval. We theoretically and empirically analyze the performance of three variants of mp-LSH, and demonstrate their usefulness on real-world data sets.
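For readers unfamiliar with LSH, the sketch below shows one classical single-purpose scheme, sign random projection for cosine dissimilarity. It is not mp-LSH and does not share codes across dissimilarities. It only illustrates how a hash code is tied to one dissimilarity, which is the limitation the abstract addresses. The function names and the 16-bit code length are assumptions made for the example.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vector, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(code_a, code_b):
    """Number of differing bits; smaller means a smaller estimated angle."""
    return sum(a != b for a, b in zip(code_a, code_b))

planes = make_hyperplanes(dim=3, n_bits=16)
query = [1.0, 0.5, -0.2]
scaled = [2.0 * v for v in query]    # same direction, different length
opposite = [-v for v in query]       # opposite direction

code_query = hash_code(query, planes)
print(hamming(code_query, hash_code(scaled, planes)))
print(hamming(code_query, hash_code(opposite, planes)))
```

The code ignores vector length, so it suits cosine dissimilarity but cannot by itself answer L2 or inner product queries. Schemes for those measures would normally need separate codes.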